Take-home_Ex1: Geospatial Analytics for Public Good

Author

NeoYX

Published

November 24, 2023

Modified

November 30, 2023

Setting the Scene

The increasing digitization of urban infrastructures, such as public transportation and utilities, generates vast datasets using technologies like GPS and RFID. These datasets offer valuable insights into human movement patterns within a city, especially with the widespread deployment of smart cards and GPS devices in vehicles. However, current practices often limit the use of this data to basic tracking and mapping with GIS applications, primarily because conventional GIS lacks advanced functions for analyzing and modeling spatial and spatio-temporal data effectively. Enhanced analysis of these datasets could significantly contribute to better urban management and informed decision-making for both public and private urban transport services providers.

Objectives of Take-home_Ex1

Exploratory Spatial Data Analysis (ESDA) holds tremendous potential to address complex problems facing society. In this study, I will perform the following tasks:

  1. Geovisualisation of passenger trips by four different time intervals
    • With reference to the time intervals provided in the table below, compute the passenger trips generated by origin at the hexagon level,

      Peak hour period              Bus tap on time
      Weekday morning peak          6am to 9am
      Weekday afternoon peak        5pm to 8pm
      Weekend/holiday morning peak  11am to 2pm
      Weekend/holiday evening peak  4pm to 7pm
    • Display the geographical distribution of the passenger trips by using appropriate geovisualisation methods,

    • Describe the spatial patterns revealed by the geovisualisation (not more than 200 words per visual).

  2. Apply appropriate Local Indicators of Spatial Association (LISA) methods to uncover the spatial mobility patterns of public bus passengers in Singapore.
    • Compute the LISA of the passenger trips generated by origin at the hexagon level.

    • Display LISA maps of the passenger trips generated by origin at the hexagon level. The maps should only display the significant results (i.e. p-value < 0.05).

    • With reference to the analysis results, draw statistical conclusions (not more than 200 words per visual).

The Data

Aspatial data

For this take-home exercise, Passenger Volume by Origin Destination Bus Stops downloaded from LTA DataMall will be used. It contains the number of trips by weekdays and weekends from origin to destination bus stops.

For this exercise, the data is collected in August 2023.

The fields are YEAR_MONTH, DAY_TYPE, TIME_PER_HOUR, PT_TYPE, ORIGIN_PT_CODE, DESTINATION_PT_CODE and TOTAL_TRIPS.

A sample row of the bus dataset could be ‘2023-08, WEEKDAY, 16, BUS, 28299, 28009, 63’. A TIME_PER_HOUR value of 16 means the trips were tapped on between 4pm and 5pm.
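Since TIME_PER_HOUR is on the 24-hour clock, the four peak periods in the task table translate into the following inclusive tap-on hour ranges (a small reference sketch; the list name peak_hours is just for illustration):

```r
# Peak periods as inclusive TIME_PER_HOUR ranges (24-hour clock).
# e.g. 5pm to 8pm spans tap-on hours 17, 18 and 19, because hour 17
# covers trips tapped on between 5pm and 6pm.
peak_hours <- list(
  weekday_am = 6:8,    # weekday morning peak, 6am to 9am
  weekday_pm = 17:19,  # weekday afternoon peak, 5pm to 8pm
  weekend_am = 11:13,  # weekend/holiday morning peak, 11am to 2pm
  weekend_pm = 16:18   # weekend/holiday evening peak, 4pm to 7pm
)
```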

Geospatial data

Two geospatial datasets will be used in this study:

  • Bus Stop Location from LTA DataMall’s static dataset. It provides information about all the bus stops currently being serviced by buses, including the bus stop code (identifier) and location coordinates.

  • hexagon, a hexagon layer with a 250m apothem (the perpendicular distance from the centre of the hexagon to each edge). Each spatial unit is regular in shape and finer than URA’s Master Plan 2019 Planning Sub-zone GIS data set.
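As a quick sanity check on the hexagon dimensions (standard regular-hexagon geometry, not taken from the data):

```r
# For a regular hexagon with apothem (centre-to-edge distance) a = 250 m:
a <- 250
side  <- 2 * a / sqrt(3)    # side length, about 288.7 m
width <- 2 * a              # edge-to-edge width, 500 m
area  <- 2 * sqrt(3) * a^2  # area, about 216,506 m^2 (~0.22 km^2)
```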

Getting Started

In this exercise, the following libraries will be used:

  1. sf package to perform geoprocessing tasks

  2. sfdep package, which builds on the spdep package (e.g. to compute spatial weights matrices and spatially lagged variables)

  3. tmap to create geovisualisations

  4. tidyverse to support data science, analysis and communication tasks, including creating static statistical graphs

  5. knitr to create html tables

  6. DT library to create interactive html tables

  7. ggplot2 to plot graphs

  8. plotly to plot interactive graphs

pacman::p_load(sf, sfdep, tmap, tidyverse, knitr, DT, ggplot2, plotly, h3jsr, skimr)

Importing Data

Aspatial data

Import the Passenger volume by Origin Destination Bus Stops dataset downloaded from the LTA Datamall by using the read_csv() of the readr package.

odbus_aug <- read_csv("data/aspatial/origin_destination_bus_202308.csv")

Check the data fields

glimpse(odbus_aug)
Rows: 5,709,512
Columns: 7
$ YEAR_MONTH          <chr> "2023-08", "2023-08", "2023-08", "2023-08", "2023-…
$ DAY_TYPE            <chr> "WEEKDAY", "WEEKENDS/HOLIDAY", "WEEKENDS/HOLIDAY",…
$ TIME_PER_HOUR       <dbl> 16, 16, 14, 14, 17, 17, 17, 17, 7, 17, 14, 10, 10,…
$ PT_TYPE             <chr> "BUS", "BUS", "BUS", "BUS", "BUS", "BUS", "BUS", "…
$ ORIGIN_PT_CODE      <chr> "04168", "04168", "80119", "80119", "44069", "4406…
$ DESTINATION_PT_CODE <chr> "10051", "10051", "90079", "90079", "17229", "1722…
$ TOTAL_TRIPS         <dbl> 7, 2, 3, 10, 5, 4, 3, 22, 3, 3, 7, 1, 3, 1, 3, 1, …

Processing the aspatial OD data

The ‘ORIGIN_PT_CODE’ and ‘DESTINATION_PT_CODE’ fields are of character type. We will convert them to the factor data type.

odbus_aug$ORIGIN_PT_CODE <- as.factor(odbus_aug$ORIGIN_PT_CODE)
odbus_aug$DESTINATION_PT_CODE <- as.factor(odbus_aug$DESTINATION_PT_CODE)

The function below will extract origin data based on the four time intervals required by the task. The expected arguments are

  1. daytype: ‘WEEKDAY’ or ‘WEEKENDS/HOLIDAY’
  2. timeinterval: c(8, 10) if we want data from 8am to 11am (both bounds are inclusive tap-on hours).

The function also computes the sum of all trips by ‘ORIGIN_PT_CODE’ for each time interval and stores it in a new field called ‘TRIPS’.

get_origin <- function(daytype, timeinterval) {
  result <- odbus_aug %>%
    filter(DAY_TYPE == daytype) %>%
    filter(TIME_PER_HOUR >= timeinterval[1] & TIME_PER_HOUR <= timeinterval[2]) %>%
    group_by(ORIGIN_PT_CODE) %>%
    summarise(TRIPS = sum(TOTAL_TRIPS))
  
  return(result)
}
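The filter-then-aggregate logic of get_origin() can be illustrated on a toy dataset (base R is used here so the example stands alone; column names follow the real data, the values are made up):

```r
# Toy OD rows: only WEEKDAY trips with tap-on hour 6-8 should be summed.
od <- data.frame(
  DAY_TYPE       = c("WEEKDAY", "WEEKDAY", "WEEKDAY", "WEEKENDS/HOLIDAY"),
  TIME_PER_HOUR  = c(6, 7, 10, 7),
  ORIGIN_PT_CODE = c("01012", "01012", "01012", "01012"),
  TOTAL_TRIPS    = c(5, 3, 99, 8)
)
keep <- od$DAY_TYPE == "WEEKDAY" &
  od$TIME_PER_HOUR >= 6 & od$TIME_PER_HOUR <= 8
origin_sum <- aggregate(TOTAL_TRIPS ~ ORIGIN_PT_CODE, data = od[keep, ], FUN = sum)
origin_sum$TOTAL_TRIPS  # 8 (5 + 3): the hour-10 and weekend rows are excluded
```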

Let’s get the data using the get_origin() function. Note that TIME_PER_HOUR is on the 24-hour clock, so the afternoon/evening peaks use hours 17-19 and 16-18.

origin_day_am <- get_origin('WEEKDAY', c(6, 8))     # 6am to 9am
origin_day_pm <- get_origin('WEEKDAY', c(17, 19))   # 5pm to 8pm


origin_end_am <- get_origin('WEEKENDS/HOLIDAY', c(11, 13))  # 11am to 2pm
origin_end_pm <- get_origin('WEEKENDS/HOLIDAY', c(16, 18))  # 4pm to 7pm

Take a look at an overview of all four dataframes.

glimpse(origin_day_am)
Rows: 5,005
Columns: 2
$ ORIGIN_PT_CODE <fct> 01012, 01013, 01019, 01029, 01039, 01059, 01109, 01112,…
$ TRIPS          <dbl> 1425, 683, 1146, 1707, 1752, 1106, 115, 5827, 6847, 257…
glimpse(origin_day_pm)
Rows: 4,965
Columns: 2
$ ORIGIN_PT_CODE <fct> 01012, 01013, 01019, 01029, 01039, 01059, 01109, 01112,…
$ TRIPS          <dbl> 752, 350, 436, 832, 823, 637, 305, 2918, 5127, 1091, 28…
glimpse(origin_end_am)
Rows: 4,994
Columns: 2
$ ORIGIN_PT_CODE <fct> 01012, 01013, 01019, 01029, 01039, 01059, 01109, 01112,…
$ TRIPS          <dbl> 1546, 1203, 1156, 2373, 3637, 806, 73, 10074, 5624, 446…
glimpse(origin_end_pm)
Rows: 4,623
Columns: 2
$ ORIGIN_PT_CODE <fct> 01012, 01013, 01019, 01029, 01039, 01059, 01109, 01112,…
$ TRIPS          <dbl> 119, 51, 45, 71, 138, 80, 66, 240, 275, 86, 395, 18, 78…

Geospatial data

First, we will import the Bus Stop Location shapefile into R. The output will be an sf point object with 5161 points and 3 fields. Although the raw data is registered as SVY21, its CRS is not fully defined, so we will transform it to EPSG 3414 (SVY21 / Singapore TM), the projected coordinate system for Singapore.

busstop <- st_read(dsn="data/geospatial/BusStopLocation/BusStopLocation_Jul2023", layer = "BusStop") %>% 
  st_transform(crs = 3414)
Reading layer `BusStop' from data source 
  `C:\yixin-neo\ISSS624_AGA\Take-home_Ex1\data\geospatial\BusStopLocation\BusStopLocation_Jul2023' 
  using driver `ESRI Shapefile'
Simple feature collection with 5161 features and 3 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 3970.122 ymin: 26482.1 xmax: 48284.56 ymax: 52983.82
Projected CRS: SVY21
busstop
Simple feature collection with 5161 features and 3 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 3970.122 ymin: 26482.1 xmax: 48284.56 ymax: 52983.82
Projected CRS: SVY21 / Singapore TM
First 10 features:
   BUS_STOP_N BUS_ROOF_N             LOC_DESC                  geometry
1       22069        B06   OPP CEVA LOGISTICS POINT (13576.31 32883.65)
2       32071        B23         AFT TRACK 13 POINT (13228.59 44206.38)
3       44331        B01              BLK 239  POINT (21045.1 40242.08)
4       96081        B05 GRACE INDEPENDENT CH POINT (41603.76 35413.11)
5       11561        B05              BLK 166 POINT (24568.74 30391.85)
6       66191        B03         AFT CORFE PL POINT (30951.58 38079.61)
7       23389       B02A              PEC LTD   POINT (12476.9 32211.6)
8       54411        B02              BLK 527 POINT (30329.45 39373.92)
9       28531        B09              BLK 536 POINT (14993.31 36905.61)
10      96139        B01              BLK 148  POINT (41642.81 36513.9)

Are there any duplicates in ‘BUS_STOP_N’ in busstop?

Checking for duplicates in the ‘BUS_STOP_N’ field reveals 16 bus stop numbers that appear more than once. However, they have different point geometries in the busstop simple feature object above. These could be temporary bus stops. We should retain all these rows as they might fall into different hexagon grid ids later.

busstop %>% 
  st_drop_geometry() %>% 
  group_by(BUS_STOP_N) %>%
  filter(n()>1) %>%
  ungroup() %>% 
  arrange(BUS_STOP_N)
# A tibble: 32 × 3
   BUS_STOP_N BUS_ROOF_N LOC_DESC            
   <chr>      <chr>      <chr>               
 1 11009      B04        Ghim Moh Ter        
 2 11009      B04-TMNL   GHIM MOH TER        
 3 22501      B02        Blk 662A            
 4 22501      B02        BLK 662A            
 5 43709      B06        BLK 644             
 6 43709      B06        BLK 644             
 7 47201      UNK        <NA>                
 8 47201      NIL        W'LANDS NTH STN     
 9 51071      B21        MACRITCHIE RESERVOIR
10 51071      B21        MACRITCHIE RESERVOIR
# ℹ 22 more rows

Take for instance bus stop number 51071, which has two different point geometry values.

busstop[3470,]
Simple feature collection with 1 feature and 3 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 28311.27 ymin: 36036.92 xmax: 28311.27 ymax: 36036.92
Projected CRS: SVY21 / Singapore TM
     BUS_STOP_N BUS_ROOF_N             LOC_DESC                  geometry
3470      51071        B21 MACRITCHIE RESERVOIR POINT (28311.27 36036.92)
busstop[3472,]
Simple feature collection with 1 feature and 3 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 28282.54 ymin: 36033.93 xmax: 28282.54 ymax: 36033.93
Projected CRS: SVY21 / Singapore TM
     BUS_STOP_N BUS_ROOF_N             LOC_DESC                  geometry
3472      51071        B21 MACRITCHIE RESERVOIR POINT (28282.54 36033.93)

Creating hexagon layer

Before we can plot the base layer, we have to create a hexagonal grid (250m centre to edge) from the busstop points using sf.

First, create a grid whose extent equals the bounding box of the busstop points using st_make_grid().

  1. To create hexagons of 250m (centre to edge), we input 500 for the ‘cellsize’ parameter. ‘cellsize’ is defined as the distance from one edge to the opposite edge.
  2. Convert to sf object and add a unique identifier to each hexagon grid. The output has 5580 hexagon units.
  3. Use st_intersects() to count the number of bus stops in each hexagon.
  4. Filter to retain only hexagons with at least one bus stop. The output bs_count has 1524 hexagon units.
area_hex_grid = st_make_grid(busstop,
                             cellsize= 500, 
                             what = "polygons", 
                             square = FALSE)

hex_grid_sf = st_sf(area_hex_grid) %>%
  mutate(grid_id = 1:length(lengths(area_hex_grid)))

hex_grid_sf$bs = lengths(st_intersects(hex_grid_sf, busstop))


bs_count = filter(hex_grid_sf, bs > 0)
bs_count
Simple feature collection with 1524 features and 2 fields
Geometry type: POLYGON
Dimension:     XY
Bounding box:  xmin: 3720.122 ymin: 26193.43 xmax: 48720.12 ymax: 53184.55
Projected CRS: SVY21 / Singapore TM
First 10 features:
                    area_hex_grid grid_id bs
1  POLYGON ((3970.122 27925.48...      34  1
2  POLYGON ((4220.122 28358.49...      65  1
3  POLYGON ((4470.122 30523.55...      99  1
4  POLYGON ((4720.122 28358.49...     127  1
5  POLYGON ((4720.122 30090.54...     129  2
6  POLYGON ((4720.122 30956.57...     130  1
7  POLYGON ((4720.122 31822.59...     131  1
8  POLYGON ((4970.122 28791.5,...     159  1
9  POLYGON ((4970.122 29657.53...     160  1
10 POLYGON ((4970.122 30523.55...     161  2

What does our hexagon layer look like?

plot(bs_count['bs'],
     main = 'Count of bus stops at hexagon level')

What does the distribution of bus stops per hexagon look like?

q <- quantile(bs_count$bs, probs = c(0.25, 0.5, 0.75))

ggplot(data=bs_count,
  aes(x=bs)) +
  geom_histogram(binwidth = 1, color='black',size= 0.3, fill = '#DD8888') +
  geom_vline(xintercept = q[2], linetype='dotted', size = 0.8, color='blue') +
  geom_vline(xintercept = q[3], linetype='dotted', size = 0.8) +
  annotate('text' , x= 2.3, y=450, label='50th \npercentile', size = 3) +
  annotate('text' , x= 4.7, y=450, label='75th \npercentile', size = 3) +
  labs(y= 'Count', x='Number of bus stops') +
  theme(axis.title.y=element_text(angle = 0)) +
  ggtitle('Distribution of Bus stops')

The plot above shows that the distribution of bus stops per hexagon is right-skewed. Within a hexagon (250m from centre to edge), the median number of bus stops is about 3.

If we wish, we can save the hexagon layer to disk:

st_write(bs_count, 'data/bs_count.shp')

Geospatial data wrangling

Combining busstop (point sf) and bs_count (polygon sf)

We need to get the corresponding bus stop numbers in each hexagon in bs_count hexagon layer.

The code chunk below overlays the bus stop points on the hexagon polygons using st_intersection(); the output will be an sf point object.

Before overlapping:

busstop : 5161 points

bs_count hexagon base layer: 1524 hexagons

After overlapping:

busstop_hex : 5161 points. This is a good result because every bus stop location is mapped onto our bs_count base map.

busstop_hex <- st_intersection(busstop, bs_count) %>% 
  select(BUS_STOP_N, BUS_ROOF_N, LOC_DESC,  grid_id, bs)

busstop_hex
Simple feature collection with 5161 features and 5 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 3970.122 ymin: 26482.1 xmax: 48284.56 ymax: 52983.82
Projected CRS: SVY21 / Singapore TM
First 10 features:
     BUS_STOP_N BUS_ROOF_N            LOC_DESC grid_id bs
3269      25059        UNK   AFT TUAS STH BLVD      34  1
2570      25751       B02D BEF TUAS STH AVE 14      65  1
254       26379        NIL            YONG NAM      99  1
2897      25761        B03      OPP REC S'PORE     127  1
2827      25719       B01C           THE INDEX     129  2
4203      26389        NIL  BEF TUAS STH AVE 5     129  2
2403      26369        NIL        SEE HUP SENG     130  1
1565      26299        B13  BEF TUAS STH AVE 6     131  1
2829      25741        B03         HALLIBURTON     159  1
2828      25711       B02C       OPP THE INDEX     160  1
                      geometry
3269 POINT (3970.122 28063.28)
2570 POINT (4427.939 28758.67)
254  POINT (4473.292 30681.57)
2897 POINT (4737.082 28621.29)
2827 POINT (4799.476 30158.46)
4203 POINT (4776.694 30324.88)
2403 POINT (4604.363 31212.96)
1565 POINT (4879.489 31948.93)
2829  POINT (5060.733 29212.4)
2828 POINT (4831.438 30132.43)

We will drop the geometry because busstop_hex is a POINT sf object; it carries no hexagon polygon geometry that we can plot at the hexagon level, and we still have to process the attribute data. To recover the hexagon POLYGON geometry, we can always left_join() the bs_count df with our attribute table again later.

We will also use datatable() function of the DT library to print the data. The table is interactive and can perform basic sorting.

busstop_hex <- busstop_hex  %>% 
  st_drop_geometry()

datatable(busstop_hex, class = 'cell-border stripe', options = list(pageLength = 5))

Checking for duplicates in busstop_hex

Let us check for duplicates in the busstop_hex df, as it will be used in a left join later.

The output below shows 22 rows of duplicates, i.e. 11 duplicated pairs.

busstop_hex %>%
  group_by(BUS_STOP_N, grid_id, bs) %>%
  filter(n()>1) %>%
  ungroup()
# A tibble: 22 × 5
   BUS_STOP_N BUS_ROOF_N LOC_DESC        grid_id    bs
   <chr>      <chr>      <chr>             <int> <int>
 1 22501      B02        Blk 662A           1251     8
 2 22501      B02        BLK 662A           1251     8
 3 43709      B06        BLK 644            1904     7
 4 43709      B06        BLK 644            1904     7
 5 47201      UNK        <NA>               2381     2
 6 47201      NIL        W'LANDS NTH STN    2381     2
 7 11009      B04        Ghim Moh Ter       2395     7
 8 11009      B04-TMNL   GHIM MOH TER       2395     7
 9 58031      UNK        OPP CANBERRA DR    2939     7
10 58031      UNK        OPP CANBERRA DR    2939     7
# ℹ 12 more rows

Removal of duplicates

busstop_hex <- busstop_hex %>%
  distinct(BUS_STOP_N, grid_id, bs, .keep_all = TRUE)
#busstop_hex <- unique(busstop_hex)
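The effect of distinct(..., .keep_all = TRUE) can be shown with a toy dataframe (a base-R equivalent via duplicated(); the rows mimic the duplicates seen above):

```r
# Two rows share the same (stop, grid) key; only the first is kept.
df <- data.frame(stop = c("22501", "22501", "11009"),
                 grid = c(1251, 1251, 2395),
                 desc = c("Blk 662A", "BLK 662A", "Ghim Moh Ter"))
dedup <- df[!duplicated(df[c("stop", "grid")]), ]
nrow(dedup)  # 2 rows remain
```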

Check again to be sure. All’s good.

busstop_hex[duplicated(busstop_hex), ]
[1] BUS_STOP_N BUS_ROOF_N LOC_DESC   grid_id    bs        
<0 rows> (or 0-length row.names)

Note that some bus stop ids still appear more than once, with different bus stop roof numbers or hexagon ids.

busstop_hex['BUS_STOP_N'][duplicated(busstop_hex['BUS_STOP_N']), ]
[1] "53041" "52059" "68091" "68099" "67421"
Combining each of the four aspatial dataframes with busstop_hex

The function below performs left outer join for each of the four aspatial origin dataframes with busstop_hex dataframe.

The expected argument:

asp_df: name of aspatial origin dataframe like origin_day_am or origin_day_pm etc..

The function will also rename the ‘ORIGIN_PT_CODE’ and ‘grid_id’ fields to ‘ORIGIN_BS’ and ‘ORIGIN_HEXFID’ respectively.

leftjoin <- function(asp_df) {
  result <- left_join(asp_df, busstop_hex,
                         by = c('ORIGIN_PT_CODE' = 'BUS_STOP_N')) %>%
  rename(ORIGIN_BS = ORIGIN_PT_CODE,
         ORIGIN_HEXFID = grid_id) %>% 
  select(ORIGIN_HEXFID, 
         ORIGIN_BS, 
         TRIPS,
         BUS_ROOF_N,
         LOC_DESC)
  
  return(result)
}

Now, use the ‘leftjoin’ function to get our dataframes containing grid_id and origin bus stop ids.

origin_data_day_am <- leftjoin(origin_day_am) # 5005 to 5010 rows
origin_data_day_pm <- leftjoin(origin_day_pm) # 4965 to 4970 rows
origin_data_end_am <- leftjoin(origin_end_am) # 4994 to 4999 rows
origin_data_end_pm <- leftjoin(origin_end_pm) # 4623 to 4627 rows

The number of rows in each df increased by 4-5 after the left join. This is due to the five duplicated bus stop ids identified earlier, each with two distinct bus stop roofs or located in two different hexagons, carried over from the busstop and busstop_hex dataframes.
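The row inflation from a one-to-many left join can be reproduced with a toy example (base R merge(); the bus stop codes are taken from above, the grid ids and trip counts are made up):

```r
# "53041" exists in two hexagons, so its single trips row is duplicated.
trips  <- data.frame(ORIGIN_PT_CODE = c("53041", "22069"),
                     TRIPS = c(100, 50))
lookup <- data.frame(BUS_STOP_N = c("53041", "53041", "22069"),
                     grid_id = c(1111, 1112, 34))
joined <- merge(trips, lookup,
                by.x = "ORIGIN_PT_CODE", by.y = "BUS_STOP_N", all.x = TRUE)
nrow(joined)  # 3 rows from 2: one extra row per duplicated lookup match
```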

Again, double-check for duplicates of the (‘ORIGIN_HEXFID’, ‘ORIGIN_BS’) pair. Note that column subsetting uses c() to select both fields.

origin_data_day_am[c('ORIGIN_HEXFID', 'ORIGIN_BS')][duplicated(origin_data_day_am[c('ORIGIN_HEXFID', 'ORIGIN_BS')]), ]
# A tibble: 0 × 2
# ℹ 2 variables: ORIGIN_HEXFID <int>, ORIGIN_BS <chr>
origin_data_day_pm[c('ORIGIN_HEXFID', 'ORIGIN_BS')][duplicated(origin_data_day_pm[c('ORIGIN_HEXFID', 'ORIGIN_BS')]), ]
# A tibble: 0 × 2
# ℹ 2 variables: ORIGIN_HEXFID <int>, ORIGIN_BS <chr>
origin_data_end_am[c('ORIGIN_HEXFID', 'ORIGIN_BS')][duplicated(origin_data_end_am[c('ORIGIN_HEXFID', 'ORIGIN_BS')]), ]
# A tibble: 0 × 2
# ℹ 2 variables: ORIGIN_HEXFID <int>, ORIGIN_BS <chr>
origin_data_end_pm[c('ORIGIN_HEXFID', 'ORIGIN_BS')][duplicated(origin_data_end_pm[c('ORIGIN_HEXFID', 'ORIGIN_BS')]), ]
# A tibble: 0 × 2
# ℹ 2 variables: ORIGIN_HEXFID <int>, ORIGIN_BS <chr>

The tibble below shows the bus stops with more than one bus roof number that exist in more than one hexagon.

origin_data_day_am['ORIGIN_BS'][duplicated(origin_data_day_am['ORIGIN_BS']), ]
# A tibble: 5 × 1
  ORIGIN_BS
  <chr>    
1 52059    
2 53041    
3 67421    
4 68091    
5 68099    

Check for bus stops without any hexagon id

Using skim() from the skimr package reveals that there are about 44-52 missing ‘ORIGIN_HEXFID’ values in each of our dataframes (see the summary table below).

The code chunk below will

  • filter character or numeric field

  • filter for fields with missing values

  • select required columns

  • convert output to tibble df

  • rename the columns (one statically and one dynamically)

  • combine all the output in a summary tibble df.

Summary table of number of missing ‘ORIGIN_HEXFID’ in each of the four dataframes.

Show the code
#str(skim(origin_data_day_am))

get_na_hex <- function(df, col2header) {
  result <- skimr::skim(df) %>%
  filter(skim_type == "character" | skim_type == "numeric") %>%
  filter(n_missing > 0) %>%
  select(skim_variable, n_missing)%>% 
  as_tibble() %>% 
  rename_with(~col2header, .cols = n_missing) %>%
  rename(Variable = skim_variable)
  
  return(result)
}


r1 <- get_na_hex(origin_data_day_am, 'Wkday_morn') 
r2 <- get_na_hex(origin_data_day_pm, 'Wkday_aft') 
r3 <- get_na_hex(origin_data_end_am, 'Wkend_morn') 
r4 <- get_na_hex(origin_data_end_pm, 'Wkend_aft') 

bind_cols(r1,r2,r3,r4) %>% 
  select(1,2,4,6,8) %>% 
  slice(3) %>% kable()
Variable        Wkday_morn   Wkday_aft   Wkend_morn   Wkend_aft
ORIGIN_HEXFID           52          50           49          44

Some bus stop ids lack a hexagon id, possibly because the Bus Stop Location file is outdated (last updated in July 2023) while the Passenger Volume by Origin Destination Bus Stops data covers August 2023.
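The per-dataframe NA counts above can also be cross-checked directly with base R (toy data for illustration):

```r
# Counting rows without a hexagon id is a simple is.na() sum.
df <- data.frame(ORIGIN_HEXFID = c(1251, NA, 2411, NA),
                 TRIPS = c(10, 5, 7, 3))
sum(is.na(df$ORIGIN_HEXFID))  # 2 bus stops unmatched to any hexagon
```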

Bus stop ids without hexagon id

Show the code
datatable(origin_data_day_am %>%
  filter(is.na(ORIGIN_HEXFID)), 
  class = 'cell-border stripe', 
  options = list(pageLength = 5))
Show the code
datatable(origin_data_day_pm %>%
  filter(is.na(ORIGIN_HEXFID)), 
  class = 'cell-border stripe', 
  options = list(pageLength = 5))
Show the code
datatable(origin_data_end_am %>%
  filter(is.na(ORIGIN_HEXFID)), 
  class = 'cell-border stripe', 
  options = list(pageLength = 5))
Show the code
datatable(origin_data_end_pm %>%
  filter(is.na(ORIGIN_HEXFID)), 
  class = 'cell-border stripe', 
  options = list(pageLength = 5))

Aggregating total trips and average daily trips by hexagon level

Since we are plotting the number of passenger trips generated at the hexagon level, we should aggregate the total trips by ‘ORIGIN_HEXFID’ and store the values in a new field called ‘TTRIPS’.

In addition, a new field ‘AVG_TRIPS’ is calculated, representing the average daily weekday trips (= TTRIPS / 22) or the average daily weekend/holiday trips (= TTRIPS / 9). There are 22 weekdays, 8 weekend days and 1 public holiday in August 2023.
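The divisors 22 and 9 can be verified with base R date arithmetic (National Day, 9 August 2023, is the single public holiday):

```r
dates <- seq(as.Date("2023-08-01"), as.Date("2023-08-31"), by = "day")
wday  <- as.POSIXlt(dates)$wday             # 0 = Sunday ... 6 = Saturday
n_weekend <- sum(wday %in% c(0, 6))         # 8 weekend days
n_weekday <- sum(!wday %in% c(0, 6)) - 1    # 23 calendar weekdays minus National Day = 22
n_weekend + 1                               # 9 weekend/holiday days
```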

After aggregation, the numbers of distinct hexagons for the four time intervals are 1491, 1489, 1499 and 1444, against 1524 hexagons in bs_count. This suggests that some hexagons generated no passenger trips during certain time intervals.

get_ttrips_wkday <- function(df) {
  result <- df %>% 
    group_by(ORIGIN_HEXFID) %>% 
    summarise(
      TTRIPS = sum(TRIPS),
      AVG_TRIPS = ceiling(sum(TRIPS) / 22),
      DESC = paste(LOC_DESC, collapse = ', ')
    ) %>% 
    ungroup()
  
  return(result)
}

get_ttrips_wkend <- function(df) {
  result <- df %>% 
    group_by(ORIGIN_HEXFID) %>% 
    summarise(
      TTRIPS = sum(TRIPS),
      AVG_TRIPS = ceiling(sum(TRIPS) / 9),
      DESC = paste(LOC_DESC, collapse = ', ')
    ) %>% 
    ungroup()
  
  return(result)
}

origin_data_day_am_hex <- get_ttrips_wkday(origin_data_day_am) # 5010 to 1491 rows
origin_data_day_pm_hex <- get_ttrips_wkday(origin_data_day_pm) # 4970 to 1489 rows
origin_data_end_am_hex <- get_ttrips_wkend(origin_data_end_am) # 4999 to 1499 rows
origin_data_end_pm_hex <- get_ttrips_wkend(origin_data_end_pm) # 4627 to 1444 rows

The four dataframes above will be printed out in the later sections.

Retrieve hexagon geometry coordinates

In order to plot choropleth maps, we need the geometry data from the bs_count sf polygon object.

The function below left joins bs_count with the attribute dataframes and assigns 0 to the ‘TTRIPS’ and ‘AVG_TRIPS’ values of hexagons without any passenger trips.

get_hexgeo <- function(df) {
  result <- left_join(bs_count, df,
                      by = c('grid_id' = 'ORIGIN_HEXFID')) %>%
    mutate(TTRIPS = if_else(is.na(TTRIPS),0, TTRIPS),
           AVG_TRIPS = if_else(is.na(AVG_TRIPS), 0, AVG_TRIPS))
    
  return(result)
}

day_am <- get_hexgeo(origin_data_day_am_hex)
day_pm <- get_hexgeo(origin_data_day_pm_hex)
end_am <- get_hexgeo(origin_data_end_am_hex)
end_pm <- get_hexgeo(origin_data_end_pm_hex)

Task 1: Geovisualisation and Analysis

In this section, choropleth maps will be used to show the geographical distribution of the passenger trips by origin. I will plot the ‘AVG_TRIPS’ field instead of ‘TTRIPS’ so that weekday and weekend periods can be compared fairly on a daily basis.

Planning for map classification

To plan the classification for the maps, we can check the distribution of the ‘AVG_TRIPS’ field across ALL four time intervals.

First, combine all the ‘AVG_TRIPS’ values and create a new column ‘source’ to record the name of each time interval.

origin_data_day_am_hex$source <- 'Wkday_am'
origin_data_day_pm_hex$source <- 'Wkday_pm'
origin_data_end_am_hex$source <- 'Wkend_am'
origin_data_end_pm_hex$source <- 'Wkend_pm'

combine <- rbind(origin_data_day_am_hex,
                 origin_data_day_pm_hex,
                 origin_data_end_am_hex,
                 origin_data_end_pm_hex)

Boxplots

Plot boxplots to get a general sense of the distribution of ‘AVG_TRIPS’ across the time intervals. Hovering over each point shows the average daily trips generated in that hexagon.

Show the code
ggplotly(ggplot(data=combine, 
       aes(y=AVG_TRIPS,
           x=source)) +
  geom_boxplot() +
  geom_point(stat = "summary",        
             fun = "mean",           
             colour = "red",          
             size = 2) + 
  labs(x = "Time Intervals", y = "Total Trips", title='Daily average origin trips by time intervals') +
  theme_minimal() +
  theme(legend.key.size = unit(0.5,'cm'),
        legend.position="bottom",
        plot.title = element_text(size = 12,
                                  face='bold'),
        axis.title = element_text(size = 11 , face = "bold"),
        axis.text = element_text(size = 10),
        axis.text.x = element_text(angle = 0, hjust = 1)))

The chart above shows that the weekday average daily ridership substantially exceeds that of the weekend periods.

Summary statistics

A final check on the summary statistics of ‘AVG_TRIPS’ before setting custom break points.

summary(combine$AVG_TRIPS)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
    1.0    19.0   122.0   411.1   473.0 15321.0 

Outliers

Because of the large number of outlier hexagons, let us calculate the upper bound for outliers (Q3 + 1.5 * IQR).

Q1 <- summary(combine$AVG_TRIPS)[2]
Q3 <- summary(combine$AVG_TRIPS)[5]
IQR_value <- Q3 - Q1

lower_bound <- Q1 - 1.5 * IQR_value
upper_bound <- Q3 + 1.5 * IQR_value

#lower_bound
upper_bound
3rd Qu. 
   1154 

Setting custom breaks

With reference to the box plot, summary statistics and upper bound values,

  • Min : 1, Max : 15,321

  • quartile break points can be 20, 120 and 470

  • the break vector is therefore c(0, 20, 120, 470, 1154, 10000, 15321)

The first three intervals reflect the 1st to 3rd quartiles. The last two intervals distinguish the outliers and extreme outliers (hexagons with very high daily passenger numbers) on the maps later.

plotmap <- function(df, title) {
  df2 <- df %>% top_n(10, AVG_TRIPS)
  
  result <- tm_shape(df) +
    tm_fill("AVG_TRIPS", 
            breaks = c(0, 20, 120, 470, 1154, 10000, 15321), #style = "quantile", 
            palette = "Blues",
            alpha= 0.5,
            #legend.hist = TRUE, 
            #legend.is.portrait = TRUE,
            #legend.hist.z = 0.3,

            title = "Passengers Trip") +
    tm_layout(main.title = title,
              main.title.position = "center",
              main.title.size = 1.2,
              legend.height = 0.45, 
              legend.width = 0.35,
              #legend.outside = TRUE,
              #legend.text.size= 0.6,
              #inner.margins = c(0.01, 0.01, 0, .15),
              #legend.position = c("right", "top"),
              #bg.color = "black",
              #main.title.color = 'white',
              #legend.title.color = 'white',
              #legend.text.color= 'white',
              bg.color = "mintcream",
              frame = TRUE) +
    tm_borders(alpha = 0.5) +
    tm_compass(type="8star", size = 2) +
    tm_scale_bar() +
    tm_grid(alpha =0.2) +
    tm_credits("Source: Passenger Bus origin and destination data and Bus Stop Location data from LTA datamall", 
             position = c("left", "bottom")) +
    tm_shape(df2) +
    tm_bubbles(size='AVG_TRIPS', col='red',alpha=0.5, scale=.6)
  
  return(result)
}
p1 <- plotmap(day_am, 'Weekday-Morning (Daily Average) \nPassenger Trips generated at Hex lvl')
p2 <- plotmap(day_pm, 'Weekday-Afternoon (Daily Average) \nPassenger Trips generated at Hex lvl')
p3 <- plotmap(end_am, 'Weekend-Morning (Daily Average) \nPassenger Trips generated at Hex lvl')
p4 <- plotmap(end_pm, 'Weekend-Afternoon (Daily Average) \nPassenger Trips generated at Hex lvl')

Interactive DataTables

We will print the data tables here for ease of cross-checking the interesting hexagon ids spotted in the maps later.

Show the code
datatable(origin_data_day_am_hex %>% 
            arrange(desc(TTRIPS)), 
          class = 'cell-border stripe', options = list(pageLength = 5))
Show the code
datatable(origin_data_day_pm_hex %>% 
            arrange(desc(TTRIPS)),
          class = 'cell-border stripe', options = list(pageLength = 5))
Show the code
datatable(origin_data_end_am_hex %>% 
            arrange(desc(TTRIPS)),
          class = 'cell-border stripe', options = list(pageLength = 5))
Show the code
datatable(origin_data_end_pm_hex %>% 
            arrange(desc(TTRIPS)),
          class = 'cell-border stripe', options = list(pageLength = 5))

Map of Weekday-Morning (Daily Average) Passenger Trips generated at Hex lvl

The tmap_mode is set to ‘view’ mode. Clicking on each hexagon shows the hex id and the ‘AVG_TRIPS’ value. With the hex id, we can look up the bus stop descriptions in the interactive data tables above.

Show the code
tmap_mode('view')
#ttm()
p1

Map of Weekday-Afternoon (Daily Average) Passenger Trips generated at Hex lvl

Show the code
p2

Map of Weekend-Morning (Daily Average) Passenger Trips generated at Hex lvl

Show the code
p3

Map of Weekend-Afternoon (Daily Average) Passenger Trips generated at Hex lvl

Show the code
p4

Facet View

Let us plot all four maps side-by-side for easy comparison.

ttm()
tmap_arrange(p1,p2,p3,p4,
             asp=2, nrow = 2, ncol = 2)

Discussion

The top row shows the average daily weekday morning and afternoon passenger trips, while the bottom row shows the weekend morning and afternoon ridership generated at the hexagon level. The classification method used is manual: the first three intervals represent the 1st to 3rd quartiles, and the last two intervals capture the outliers and extreme outliers. In addition, the red bubbles mark the hexagons with the top 10 average daily passenger trips in each time interval. We can observe the following:

  • Average daily ridership generation is greatest on weekday mornings, followed by weekday afternoons and weekend mornings (hard to distinguish between these two at a glance). Weekend afternoons register the least ridership volume.

  • The passenger trips generated in the Tuas area are very low compared to the rest of the island.

  • At first glance, the top two hexagons with the highest origin ridership volume are hex ids 1251 and 2411; both are consistently the top two origins across all four time intervals. The bus stops that fall within these hexagons (refer to the interactive data tables above) are

    • hex id 1251: BOON LAY INT, BLK 662D, OPP BLK 662C, Blk 662A, BLK 664C

    • hex id 2411: WOODLANDS REG INT, W’LANDS CIVIC CTR, OPP W’LANDS CIVIC CTR, BEF W’LANDS STN EXIT 7, W’LANDS STN EXIT 5, W’LANDS STN EXIT 4, BLK 894C, BLK 515

  • Besides the top two above, several more hexagon ids are consistently in the top 10 origin bus stops across all four time intervals:

    • Hex id 3239: ANG MO KIO INT, BLK 322, AFT ANG MO KIO STN EXIT A, BLK 422, AFT ANG MO KIO INT, ANG MO KIO STN

    • Hex id 4539: TAMPINES INT, OUR TAMPINES HUB, OPP OUR TAMPINES HUB, UOB TAMPINES CTR, OPP CENTURY SQ, Tampines Stn Exit D, Aft Tampines Stn Exit E

    • Hex id 4349: BEDOK BUS INT, BEDOK STN EXIT B, BEDOK STN EXIT A, OPP PANASONIC, PANASONIC

  • Two hexagons fall within the top 10 on weekdays but drop out of the top 10 on weekends:

    • hex id 909: BEF JOO KOON INT, OPP JOO KOON INT, MOLEX, OPP MOLEX, OPP FAIRPRICE HUB, JOO KOON INT

    • hex id 2054: CLEMENTI STN, CLEMENTI STN, BLK 431

  • Interestingly, hex id 3204 is within the top 10 on both weekday and weekend mornings but falls out of the top 10 on weekday and weekend afternoons, suggesting that this region is busier in the morning than in the afternoon.

    • hex id 3204 : TOA PAYOH BUS INT, OPP TOA PAYOH STN, TOA PAYOH STN, BLK 138B, BLK 84B, BLK 79C
  • Across all four time intervals, there appear to be High-High and Low-Low clusters as well as Low-High and High-Low outliers in our study area. However, we cannot be sure until we run the local Moran’s I statistical test.

Task 2: Local Indicators of Spatial Association (LISA) Analysis

In this section, we will use LISA statistics to identify clusters and outliers for a given variable, in our case the total number of trips generated at the hexagon level. We will apply the local Moran’s I statistic to detect clusters and/or outliers in the total trips generated at the hexagon level.

Computing Distance-Based Spatial Weights Matrix

Before we can compute the global/local spatial autocorrelation statistics, we need to construct a spatial weights matrix for our study area. The spatial weights define which spatial units (hexagons) are neighbours of one another. There are a few general rules:

  • Every feature should have at least one neighbour, and no feature should be a neighbour of all other features.

  • If the data is skewed, each feature should have at least 8 neighbours.
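These rules can be checked directly on a neighbour list. Below is a minimal base-R sketch; the toy neighbour list is hypothetical, but spdep/sfdep store neighbours in essentially this form, a list of integer vectors:

```r
# Hypothetical neighbour list for 4 features: feature i's neighbours are
# the integer indices stored in nb[[i]].
nb <- list(c(2L, 3L), c(1L, 3L), c(1L, 2L, 4L), 3L)
n  <- length(nb)

# Rule 1: every feature should have at least one neighbour.
rule1 <- all(lengths(nb) >= 1)     # TRUE for this toy list

# Rule 2: no feature should be a neighbour of all other features.
rule2 <- all(lengths(nb) < n - 1)  # FALSE here: feature 3 touches everyone else
```

On a real `nb` object produced by spdep, `card(nb)` gives the same per-feature neighbour counts as `lengths()` does here.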

Contiguity or Distance-based method?

We will rule out contiguity-based methods because, as seen in the geovisualisation above, bus stops are rather sparse in some regions such as the central catchment areas, military training areas and airports, leaving gaps between groups of hexagons. We are therefore more inclined towards distance-based methods.

There are two main types of distance-based methods, namely:

  1. Adaptive distance-based (fixed number of neighbours, e.g. knn = 18)

    This is our choice: our data is highly right-skewed, and each hexagon is guaranteed at least n neighbours, which is useful when testing for statistical significance later. To illustrate, we can visualise the neighbours network in a cropped area of our study area.

The code chunk below first determines the rectangular extent of our study area using st_bbox() of the sf package. Next, it adjusts the bounding parameters. Finally, we use st_crop() of the sf package to crop the day_am sf object. The resulting cropped map covers a grassland bounded by Gambas Ave, Woodlands Ave 12, Seletar Expressway and Mandai Ave.

Show the code
bbox_new <- st_bbox(day_am)
#    xmin     ymin     xmax     ymax 
#18720.12 31193.43 33720.12 43184.55 
bbox_new[1] <- bbox_new[1] + 20000  # xmin
bbox_new[2] <- bbox_new[2] + 15000  # ymin
bbox_new[3] <- bbox_new[3] - 20000  # xmax
#bbox_new[4] <- bbox_new[4] - 5000  # ymax
bbox_new
    xmin     ymin     xmax     ymax 
23720.12 41193.43 28720.12 53184.55 
Show the code
cropped <- st_crop(day_am, bbox_new) %>% 
  rename(geometry=area_hex_grid)

Next, we will use st_centroid(), knearneigh() and knn2nb() of the spdep package to calculate the centroid coordinates of the cropped map. With knn = 8, the neighbours network looks like the diagram below. (In this exercise we will use knn = 18, but we will not plot that network as it would look too dense.)

Show the code
library(spdep)
longitude <- map_dbl(cropped$geometry, ~st_centroid(.x)[[1]])
latitude <- map_dbl(cropped$geometry, ~st_centroid(.x)[[2]])
coords <- cbind(longitude, latitude)
knn8 <- spdep::knn2nb(knearneigh(coords, k=8))
plot(st_geometry(cropped), border = 'lightgray', main='Grassland bounded by Gambas Ave, \nWoodlands Ave 12, Seletar Expressway and Mandai Ave')
plot(knn8, coords, pch=18, cex=0.6, add= TRUE, col='red')

  2. Fixed-distance threshold

    This method is less appropriate because hexagons near water bodies, grassland or airports have fewer neighbours, so some hexagons would end up with fewer than 8. From the diagram below, if a fixed distance (green circle) were used, the highlighted hexagon beside the grassland would have only 2 neighbours, compared with the highlighted hexagon above it, which would have 6.

Identify adaptive distance weights

For this exercise, we will use knn = 18, i.e. each hexagon will have 18 neighbours, set using the code chunk below.

  • st_knn() of sfdep outputs a list of neighbours (‘nb’) for each hexagon

  • based on the ‘nb’ column, st_weights() of sfdep generates row-standardised spatial weights

  • ‘.before = 1’ places the two new columns at the front of our sf dataframe.

get_weights <- function(df) {
  result <- df %>% 
    mutate(nb = st_knn(area_hex_grid,
                       k=18),
           wt = st_weights(nb,
                           style='W'),
           .before =1)
  return(result)
}
wm_day_am <- get_weights(day_am)
wm_day_pm <- get_weights(day_pm)
wm_end_am <- get_weights(end_am)
wm_end_pm <- get_weights(end_pm)

Taking a sneak peek at one of the spatial weights matrices above:

kable(head(wm_day_am,3))
nb wt grid_id bs TTRIPS AVG_TRIPS DESC area_hex_grid
2, 3, 4, 5, 6, 8, 9, 10, 12, 13, 16, 17, 22, 23, 24, 30, 38, 39 0.05555556, 0.05555556, 0.05555556, 0.05555556, 0.05555556, 0.05555556, 0.05555556, 0.05555556, 0.05555556, 0.05555556, 0.05555556, 0.05555556, 0.05555556, 0.05555556, 0.05555556, 0.05555556, 0.05555556, 0.05555556 34 1 61 3 AFT TUAS STH BLVD POLYGON ((3970.122 27925.48…
1, 3, 4, 5, 6, 8, 9, 10, 12, 13, 16, 17, 22, 23, 24, 30, 38, 39 0.05555556, 0.05555556, 0.05555556, 0.05555556, 0.05555556, 0.05555556, 0.05555556, 0.05555556, 0.05555556, 0.05555556, 0.05555556, 0.05555556, 0.05555556, 0.05555556, 0.05555556, 0.05555556, 0.05555556, 0.05555556 65 1 43 2 BEF TUAS STH AVE 14 POLYGON ((4220.122 28358.49…
5, 6, 7, 9, 10, 12, 13, 14, 16, 17, 18, 23, 24, 25, 30, 31, 32, 40 0.05555556, 0.05555556, 0.05555556, 0.05555556, 0.05555556, 0.05555556, 0.05555556, 0.05555556, 0.05555556, 0.05555556, 0.05555556, 0.05555556, 0.05555556, 0.05555556, 0.05555556, 0.05555556, 0.05555556, 0.05555556 99 1 43 2 YONG NAM POLYGON ((4470.122 30523.55…
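The wt values of 0.05555556 in the preview are simply 1/18: with style = 'W', each row of weights is standardised to sum to 1, so a hexagon's 18 neighbours receive equal shares. A minimal base-R sketch of the same standardisation on a toy binary neighbour matrix:

```r
# Toy binary neighbour matrix for 3 features (1 = neighbour, 0 = not).
B <- rbind(c(0, 1, 1),
           c(1, 0, 0),
           c(1, 0, 0))

# Row-standardisation ('W' style): divide each row by its number of
# neighbours so that every row of weights sums to 1.
W <- B / rowSums(B)
rowSums(W)   # every row sums to 1
W[1, ]       # feature 1's two neighbours each get weight 0.5
```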

Global autocorrelation of spatial association (Global Moran’s I with simulation)

Global spatial association assesses the overall spatial pattern of a variable ‘TTRIPS’ across the entire study area. It provides a single value or metric that summarizes the extent to which similar values cluster together or are dispersed across the entire geographic space.

Set seed to ensure that the computation is reproducible.

set.seed(1234)

Global Moran’s I for weekday morning passenger trips generated at the hexagon level. A Monte Carlo permutation approach is used, as we do not assume normality or randomisation. The number of simulations is 99 + 1 = 100.

We can use the global_moran_perm() function of the sfdep package, taking the variable, neighbour list and weights from the wm_day_am spatial weights matrix.

global_moran_perm(wm_day_am$TTRIPS,
                       wm_day_am$nb,
                       wm_day_am$wt,
                  nsim = 99)

    Monte-Carlo simulation of Moran I

data:  x 
weights: listw  
number of simulations + 1: 100 

statistic = 0.17009, observed rank = 100, p-value < 2.2e-16
alternative hypothesis: two.sided

Global Moran’s I for weekday afternoon passenger trips generated at the hexagon level.

global_moran_perm(wm_day_pm$TTRIPS,
                       wm_day_pm$nb,
                       wm_day_pm$wt,
                  nsim = 99)

    Monte-Carlo simulation of Moran I

data:  x 
weights: listw  
number of simulations + 1: 100 

statistic = 0.18789, observed rank = 100, p-value < 2.2e-16
alternative hypothesis: two.sided

Global Moran’s I for weekend morning passenger trips generated at the hexagon level.

global_moran_perm(wm_end_am$TTRIPS,
                       wm_end_am$nb,
                       wm_end_am$wt,
                  nsim = 99)

    Monte-Carlo simulation of Moran I

data:  x 
weights: listw  
number of simulations + 1: 100 

statistic = 0.13575, observed rank = 100, p-value < 2.2e-16
alternative hypothesis: two.sided

Global Moran’s I for weekend afternoon passenger trips generated at the hexagon level.

global_moran_perm(wm_end_pm$TTRIPS,
                       wm_end_pm$nb,
                       wm_end_pm$wt,
                  nsim = 99)

    Monte-Carlo simulation of Moran I

data:  x 
weights: listw  
number of simulations + 1: 100 

statistic = 0.17146, observed rank = 100, p-value < 2.2e-16
alternative hypothesis: two.sided

For all four time intervals, since the p-value for global Moran’s I is smaller than 0.05, we can reject the null hypothesis that the spatial pattern is random. Because the Moran’s I statistic is greater than 0, we can infer that the spatial distribution shows signs of clustering in all four time intervals. This result is consistent with the choropleth maps plotted earlier.
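The permutation logic behind global_moran_perm() can be sketched in base R on toy data (the values, neighbour structure and simulation count below are illustrative, not our bus data): compute Moran’s I for the observed arrangement, recompute it for many random shuffles of the values, then rank the observed statistic against the simulated ones.

```r
set.seed(42)

# Moran's I for a row-standardised weights matrix W (rows sum to 1, so the
# usual n/S0 factor equals 1): sum_ij w_ij z_i z_j / sum_i z_i^2.
morans_i <- function(x, W) {
  z <- x - mean(x)
  sum(W * outer(z, z)) / sum(z^2)
}

# Toy data: 6 points on a line, neighbours = immediate left/right.
x <- c(10, 9, 8, 1, 2, 1)
n <- length(x)
B <- outer(1:n, 1:n, function(i, j) as.numeric(abs(i - j) == 1))
W <- B / rowSums(B)

obs  <- morans_i(x, W)                          # ~0.66: a clustered pattern
sims <- replicate(99, morans_i(sample(x), W))   # I under random shuffles
p_value <- (sum(sims >= obs) + 1) / (99 + 1)    # one-sided permutation p-value
```

Note that this sketch uses a one-sided rank for simplicity, whereas global_moran_perm() reports a two-sided alternative.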

Local autocorrelation of spatial association

Local spatial association examines spatial patterns at a more detailed, local level. Instead of providing a single summary value for the entire study area, local measures identify specific areas where the spatial association is particularly strong or weak. If the local Moran’s I is positive, it suggests a cluster (High-High or Low-Low). If negative, it suggests an outlier (Low-High or High-Low).
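What the local statistic measures can be sketched in base R on toy data (illustrative only): up to scaling, each feature’s local Moran value is its own deviation from the mean times the weighted average deviation of its neighbours, so a positive product means the feature resembles its neighbourhood (cluster) and a negative product means it differs from it (outlier).

```r
# Toy data: 6 points on a line, neighbours = immediate left/right,
# with row-standardised weights. Purely illustrative.
x <- c(10, 9, 8, 1, 2, 1)
z <- x - mean(x)
n <- length(x)
B <- outer(1:n, 1:n, function(i, j) as.numeric(abs(i - j) == 1))
W <- B / rowSums(B)

lag_z <- as.vector(W %*% z)          # spatial lag: mean deviation of neighbours
I_i   <- z * lag_z / (sum(z^2) / n)  # local Moran's I for each feature

sign(I_i)   # positive: HH or LL cluster; negative: HL or LH outlier
mean(I_i)   # the local values average to the global Moran's I
```

Here only the third point gets a negative value: it is a high value sitting next to the low half of the line, the outlier situation described above.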

Compute LISA of the passenger trips generated by origin at hex level

We can use the local_moran() function of the sfdep package. It takes the variable, the neighbour list and the weights from the wm_day_am spatial weights matrix, and computes the statistic via simulation. As the function returns its results in a nested column, we need unnest() to access the output.

get_lisa<- function(wm){
  result<- wm %>% 
  mutate(local_moran = local_moran(
    TTRIPS, nb,wt, nsim=99),
    .before=1) %>%
  unnest(local_moran)
  
  return(result)
}

lisa_day_am <- get_lisa(wm_day_am)
lisa_day_pm <- get_lisa(wm_day_pm)
lisa_end_am <- get_lisa(wm_end_am)
lisa_end_pm <- get_lisa(wm_end_pm)

New columns are added to the lisa_day_am simple feature dataframe. The new columns are:

  • ii: local Moran statistic

  • eii: expectation of the local Moran statistic; for localmoran_perm, the permutation sample means

  • var_ii: variance of the local Moran statistic; for localmoran_perm, the permutation sample standard deviations

  • z_ii: standard deviate of the local Moran statistic; for localmoran_perm, based on the permutation sample means and standard deviations

  • p_ii: p-value of the local Moran statistic using pnorm(); for localmoran_perm, using standard deviates based on the permutation sample means and standard deviations

  • p_ii_sim: for localmoran_perm(), rank() and punif() of the observed statistic rank for [0, 1] p-values using alternative=

  • p_folded_sim: the simulation folded [0, 0.5] range ranked p-value

  • skewness: For localmoran_perm, the output of e1071::skewness() for the permutation samples underlying the standard deviates

  • kurtosis: For localmoran_perm, the output of e1071::kurtosis() for the permutation samples underlying the standard deviates

  • mean: contains the quadrant ‘type’ of a typical Moran Scatterplot

  • median: contains the quadrant ‘type’ of a typical Moran Scatterplot (use this if dataset is highly skewed).

Results of LISA computation for all four time intervals

datatable(lisa_day_am, class = 'cell-border stripe', options = list(pageLength = 5))
datatable(lisa_day_pm, class = 'cell-border stripe', options = list(pageLength = 5))
datatable(lisa_end_am, class = 'cell-border stripe', options = list(pageLength = 5))
datatable(lisa_end_pm, class = 'cell-border stripe', options = list(pageLength = 5))

Visualising significant Local Moran’s I at 95% confidence level

We will use tmap core functions to build choropleth maps, using the Local Moran’s I field and p-value field for all four time intervals.

Only the significant values of Local Moran’s I values at 95% confidence level are plotted.

get_sig_lmi_map <- function(lisatable, title) {
  
  sig_lisatable <- lisatable  %>%
  filter(p_ii_sim < 0.05)
  
  result <- tm_shape(lisatable) +
    tm_polygons() +
    tm_borders(alpha = 0.5) +
    tm_shape(sig_lisatable) +
    tm_fill("ii") + 
    tm_borders(alpha = 0.4) +
    tm_layout(main.title = title,
              main.title.size = 1.3)
  
  return(result)
  
}

sig_lmi_1 <- get_sig_lmi_map(lisa_day_am,"Local Moran's I of Total Trips generated on Weekday Morning at 95% CL" )
sig_lmi_2 <- get_sig_lmi_map(lisa_day_pm, "Local Moran's I of Total Trips generated on Weekday Afternoon at 95% CL" )
sig_lmi_3 <- get_sig_lmi_map(lisa_end_am, "Local Moran's I of Total Trips generated on Weekend Morning at 95% CL" )
sig_lmi_4 <- get_sig_lmi_map(lisa_end_pm, "Local Moran's I of Total Trips generated on Weekend Afternoon at 95% CL" )

We will plot them side-by-side for easy comparison.

Show the code
tmap_mode('plot')

tmap_arrange(sig_lmi_1,
              sig_lmi_2,
              sig_lmi_3,
              sig_lmi_4,
              asp=2,
              nrow=2,
              ncol=2)

From the local Moran’s I choropleth maps above, orange regions indicate outliers, although we cannot tell from these maps whether they are Low-High or High-Low.

The green regions indicate clusters, although again we cannot tell whether they are High-High or Low-Low. One thing is certain: the green Tuas region represents Low-Low, based on the geovisualisation in the previous section.

To find out, we have to plot the LISA maps in the next section.

Visualizing significant LISA map at 95% confidence level

The LISA maps we are building now are categorical maps showing outliers (Low-High or High-Low) and clusters (High-High or Low-Low).

We will use the median values to generate the quadrants (HH, LH, HL or LL) because our data is highly right-skewed; otherwise we could use the mean values.
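A base-R sketch of how such a quadrant assignment can work (toy values and hypothetical spatial lags, not our data): each feature’s value and its spatial lag are compared against the study-area medians, and the four combinations give the four LISA categories.

```r
# Hypothetical values and their spatial lags (neighbourhood averages).
x   <- c(100, 90, 5, 4, 80, 3)
lag <- c(95, 85, 4, 6, 2, 70)

med_x   <- median(x)     # 42.5
med_lag <- median(lag)   # 38

# Four combinations of (value vs median, lag vs median).
quadrant <- ifelse(x >= med_x & lag >= med_lag, "High-High",
            ifelse(x <  med_x & lag <  med_lag, "Low-Low",
            ifelse(x >= med_x,                  "High-Low", "Low-High")))
quadrant   # two HH, two LL, one HL outlier, one LH outlier
```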

get_sig_lisa_map <- function(lisatable, title) {
  
  sig_lisatable <- lisatable  %>%
  filter(p_ii_sim < 0.05)
  
  result <- tm_shape(lisatable) +
    tm_polygons(alpha = 0.5) +
    tm_borders(alpha = 0.5) +
    
    tm_shape(sig_lisatable) +
    tm_fill("median",
            palette = c("#2c7bb6",  "#fdae61", "#abd9e9", "#d7191c"),
            alpha= 0.7) + 
    tm_dots('grid_id', alpha=0.05) +
    tm_borders(alpha = 0.4) +
    tm_layout(main.title = title,
              main.title.size = 1.5,
              legend.position = c("left", "top"))
    
  return(result)
  
}

sig_lisa_1 <- get_sig_lisa_map(lisa_day_am,"LISA categories generated on Weekday Morning at 95% CL" )
sig_lisa_2 <- get_sig_lisa_map(lisa_day_pm, "LISA categories generated on Weekday Afternoon at 95% CL" )
sig_lisa_3 <- get_sig_lisa_map(lisa_end_am, "LISA categories generated on Weekend Morning at 95% CL" )
sig_lisa_4 <- get_sig_lisa_map(lisa_end_pm, "LISA categories generated on Weekend Afternoon at 95% CL" )

LISA categories generated on Weekday Morning at 95% CL

To leverage the interactivity of the map, tm_dots() is used to plot each hexagon’s grid_id on the map for identification.

Clicking on the centre of a significant hexagon will reveal its grid_id, making it easy to identify the bus stops that fall within it (use the interactive datatable above to search by hexagon grid id).

Show the code
tmap_mode('view')
sig_lisa_1

LISA categories generated on Weekday Afternoon at 95% CL

Show the code
tmap_mode('view')
sig_lisa_2

LISA categories generated on Weekend Morning at 95% CL

Show the code
tmap_mode('view')
sig_lisa_3

LISA categories generated on Weekend Afternoon at 95% CL

Show the code
tmap_mode('view')
sig_lisa_4

Plotting all four LISA maps side by side for easy comparison

Show the code
tmap_mode('plot')

tmap_arrange(sig_lisa_1,
              sig_lisa_2,
              sig_lisa_3,
              sig_lisa_4,
              asp=2,
              nrow=2,
              ncol=2)

Discussion

The four maps (one per time interval) above show significant clusters and outliers for passenger trips generated by origin bus stops in each hexagon, at the 95% confidence level.

Blue region: Low-Low cluster

Light-blue region: Low-High outlier

Orange region: High-Low outlier

Red region: High-High cluster

We observe that passenger trips by origin are not homogeneous throughout Singapore, our study area.

Analysis by regions

At a glance, ridership is higher in the North, East and West residential regions than in the southern part of Singapore.

The high clusters are Boon Lay Interchange, Woodlands Interchange, Yishun Interchange, Ang Mo Kio Interchange, Sengkang/Hougang, Tampines Interchange, Bedok Bus Interchange, Toa Payoh Interchange, Clementi Station, Choa Chu Kang Interchange, Bukit Batok Interchange and Joo Koon Interchange.

The western part of Singapore is a cluster of low passenger trips: the Tuas, Pioneer Sector, Gul Circle, Shipyard, Samulun, Jurong Port, Penjuru Crescent and Pandan subzones. These are mainly industrial zones where workers are usually provided with company transport.

Analysis of the edges

The edges of Singapore tend to be clusters of low passenger trips, with the exception of a few outlier edges, namely:

  • hex id 194 (Western edge High-Low): AFT TUAS STH AVE 4, BEF TUAS STH AVE 4

  • hex id 353 (Western edge High-Low): OPPO TUAS LK STN

  • hex id 2298 (Southern edge High-Low): OPP HAW PAR VILLA STN, HAW PAR VILLA STN

  • hex id 5154 (Eastern edge High-Low only on weekends): TANAH MERRY FERRY TER.

    The Tanah Merah Ferry terminal is one of the gateways to Batam and Bintan, Indonesia, as well as Desaru and Tanjung Pengelih, Malaysia.

  • hex id 5133 (Eastern edge High-Low): CHANGI VILLAGE TER, AFT CHANGI GOLF COURSE, CHANGI GOLF COURSE, CHANGI VILLAGE HOTEL, BLK 4, BLK 5

    Changi Village is a place of food and leisure attraction.

  • hex id 5349 (Eastern edge High-Low): POLICE PASS OFF

    This is the SAF Ferry terminal.

Analysis of weekends vs weekdays

A few interesting observations

  • A short stretch of bus stops along Bukit Timah/Dunearn Road, near the junction of Clementi Road and the PIE exit, seems to come alive on weekend afternoons, turning into High-Low outliers.

  • Bus stops near the Singapore Botanic Gardens also turn into High-Low outliers on weekend afternoons.

  • Bus stops along Jalan Bukit Merah, River Valley Road, Havelock Road and the Rochor area turn into high clusters only on weekend mornings.

  • The Woodlands Checkpoint area is also observed to be a high cluster only on weekend afternoons, suggesting increased human flow by bus on weekends.

While this exercise helps us locate clusters and outliers, and we have suggested potential reasons behind the observed spatial patterns, other social and economic factors such as income levels, population density and other demographics could also play a role.

References

Tin Seong Kam. “2 Choropleth Mapping with R”. In R for Geospatial Data Science and Analytics. https://r4gdsa.netlify.app/chap02

Tin Seong Kam. “9 Global Measures of Spatial Autocorrelation”. In R for Geospatial Data Science and Analytics. https://r4gdsa.netlify.app/chap09

Tin Seong Kam. “10 Local Measures of Spatial Autocorrelation”. In R for Geospatial Data Science and Analytics. https://r4gdsa.netlify.app/chap10